Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications, 2006, Vol. 29, Issue (s2): 54-58. doi: 10.13190/jbupt.2006s2.54.309

• Papers

Intelligent Method for Name Entity Recognition from Biomedical Text

Wang Hao-chang1, Zhao Tie-jun1, Liu Yan-li2, Yu Hao1

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
  2. Communication Company, Liaohe Petroleum Exploration Bureau, Panjin 124010
  • Received: 2006-09-20 Online: 2006-11-30 Published: 2006-11-30
  • Corresponding author: Wang Hao-chang

Abstract:

This paper presents machine learning methods for named entity recognition in biomedical text, including the Generalized Winnow algorithm, support vector machines, and conditional random fields. In line with the characteristics of each learning algorithm, the methods use a rich feature set comprising local features, full-text features, and features drawn from external resources. The impact of different feature sets on system performance is also evaluated. To improve performance further, a post-processing module handles abbreviations, nested named entities, and boundary correction. Experimental results show that the optimized combination of feature types and the post-processing steps both contribute to better recognition, and that the resulting systems achieve good results.

Key words: named entity recognition; feature selection; support vector machine; conditional random fields
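The abstract mentions local features without listing them. As a rough illustration only, the sketch below shows the kind of token-level features commonly used in biomedical named entity recognition: word shape, affixes, and neighbouring words. The function names `word_shape` and `local_features`, the window size, and the specific feature choices are assumptions made for this example and are not taken from the paper.

```python
import re


def word_shape(token: str) -> str:
    """Collapse a token into a coarse orthographic shape, e.g. 'IL-2' -> 'A-0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    # Squeeze runs of the same symbol so 'Aaaa' and 'Aa' share one shape.
    return re.sub(r"(.)\1+", r"\1", shape)


def local_features(tokens, i, window=1):
    """Build a feature dict for tokens[i] (hypothetical feature set, not the paper's)."""
    token = tokens[i]
    feats = {
        "word": token.lower(),
        "shape": word_shape(token),
        "prefix3": token[:3].lower(),
        "suffix3": token[-3:].lower(),
        "has_digit": any(c.isdigit() for c in token),
        "has_hyphen": "-" in token,
    }
    # Neighbouring words inside the window; positions off the sentence get a pad symbol.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


if __name__ == "__main__":
    sentence = ["IL-2", "gene", "expression"]
    for idx in range(len(sentence)):
        print(local_features(sentence, idx))
```

Feature dictionaries of this form can be passed to most sequence labellers or linear classifiers after vectorization. Full-text and external-resource features would be added as further keys in the same dictionary.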

CLC number: